Anomaly detection system and method

ABSTRACT

An anomaly detection system and method are provided. The system comprises a hardware processor and a memory storing instructions that configure the hardware processor. The hardware processor receives first time-series data comprising a first set of points and second time-series data comprising a second set of points; computes an error vector for each point of the first set to obtain a first set of error vectors, and an error vector for each point of the second set to obtain a second set of error vectors, each error vector comprising one or more prediction errors; estimates parameters based on the first set of error vectors; applies (or uses) the parameters on the second set of error vectors; and detects an anomaly in the second time-series data when the parameters are applied on the second set of error vectors.

PRIORITY CLAIM

This U.S. patent application claims priority under 35 U.S.C. § 119 to: India Application No. 1516/MUM/2015, filed on Apr. 10, 2015. The entire contents of the aforementioned application are incorporated herein by reference.

TECHNICAL FIELD

This disclosure relates generally to detecting anomalies, and, more particularly, to detecting an anomaly from time-series data.

BACKGROUND

Anomaly detection is a process of monitoring objects such as humans, non-humans, and other objects for the purpose of identifying unusual patterns in behavior, activities, or other changing information. An anomaly is usually detected from time-series data using one of several existing techniques. Generally, time-series data captured by a sensor comprises unique signal patterns corresponding to the anomaly. Traditionally, anomaly detection in time-series data requires prior knowledge of the time window over which temporal analysis is done. Most anomaly detection techniques perform poorly when applied to univariate or multivariate time-series data, since they require a pre-specified time window or data that must first be pre-processed. Further, traditional process monitoring techniques use statistical measures such as cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) over a time window to detect changes in the underlying distribution. The length of this time window generally needs to be pre-determined, extensive data pre-processing is required, and the results depend heavily on the chosen window, which in turn degrades system performance. Current techniques implement prediction models to detect anomalies. However, these techniques do not account for inherently unpredictable patterns such as abrupt braking of a vehicle or a rapid rise or fall in its acceleration or deceleration.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.

In one aspect, an anomaly detection system is provided. The system comprises: one or more hardware processors; and a memory storing instructions to configure the one or more hardware processors, wherein the one or more hardware processors are configured by the instructions to: receive first time-series data comprising a first set of points, wherein each point in the first set of points is an m-dimensional vector; compute an error vector for each point from the first set of points in the first time-series data to obtain a first set of error vectors, wherein each error vector from the first set of error vectors comprises one or more prediction errors; estimate one or more parameters based on the first set of error vectors comprising the one or more prediction errors; receive second time-series data comprising a second set of points; compute an error vector for each point from the second set of points in the second time-series data to obtain a second set of error vectors; apply (or use) the one or more parameters on the second set of error vectors; and detect an anomaly (or anomalies) in the second time-series data when the one or more parameters are applied (or used) on the second set of error vectors.

The one or more hardware processors are further configured by the instructions to model at least one of the first set of error vectors to obtain a multivariate Gaussian distribution. The one or more hardware processors are further configured by the instructions to obtain one or more likelihood values for the second set of error vectors when the one or more parameters are applied on the second set of error vectors, wherein the one or more parameters comprise at least one of mu (μ), sigma (Σ), and a threshold; wherein, when at least one of the one or more likelihood values is less than the threshold, the anomaly is detected in the second time-series data; and wherein each of the first time-series data and the second time-series data comprises at least one of univariate time-series data and multivariate time-series data. The anomaly is detected based on a prediction model implemented using a long short-term memory (LSTM) neural network.

In another aspect, a processor-implemented anomaly detection method is provided, comprising: receiving first time-series data comprising a first set of points, wherein each point in the first set of points is an m-dimensional vector; computing an error vector for each point from the first set of points in the first time-series data to obtain a first set of error vectors, wherein each error vector from the first set of error vectors comprises one or more prediction errors; estimating one or more parameters based on the first set of error vectors comprising the one or more prediction errors; receiving second time-series data comprising a second set of points; computing an error vector for each point from the second set of points in the second time-series data to obtain a second set of error vectors; applying the one or more parameters on the second set of error vectors; and detecting an anomaly in the second time-series data when the one or more parameters are applied on the second set of error vectors.

The processor-implemented anomaly detection method further comprises modeling at least one of the first set of error vectors to obtain a multivariate Gaussian distribution; and obtaining one or more likelihood values when the one or more parameters are applied on the second set of error vectors, wherein the one or more parameters comprise at least one of mu (μ), sigma (Σ), and a threshold, and wherein the anomaly is detected in the second time-series data when at least one of the one or more likelihood values is less than the threshold.

The anomaly is detected based on a prediction model implemented using a long short-term memory (LSTM) neural network. The processor-implemented anomaly detection method further comprises detecting one or more anomalies in third time-series data and fourth time-series data by applying the set of estimated parameters on (i) a third set of error vectors corresponding to the third time-series data and (ii) a fourth set of error vectors corresponding to the fourth time-series data, wherein each of the first, second, third, and fourth time-series data comprises at least one of univariate time-series data and multivariate time-series data.

In yet another aspect, one or more non-transitory machine-readable information storage media are provided, comprising one or more instructions which, when executed by one or more hardware processors, cause anomaly detection. The anomaly is detected by performing the steps of: receiving first time-series data comprising a first set of points, wherein each point in the first set of points is an m-dimensional vector; computing an error vector for each point from the first set of points in the first time-series data to obtain a first set of error vectors, wherein each error vector from the first set of error vectors comprises one or more prediction errors; estimating one or more parameters based on the first set of error vectors comprising the one or more prediction errors; receiving second time-series data comprising a second set of points; computing an error vector for each point from the second set of points in the second time-series data to obtain a second set of error vectors; applying the one or more parameters on the second set of error vectors; and detecting an anomaly in the second time-series data when the one or more parameters are applied on the second set of error vectors. The one or more parameters comprise at least one of mu (μ), sigma (Σ), and a threshold. One or more likelihood values are obtained when the one or more parameters are applied on the second set of error vectors. When at least one of the one or more likelihood values is less than the threshold, the anomaly is detected.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer-readable media and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.

FIG. 1 illustrates a network implementation of an anomaly detection system according to an embodiment of the present disclosure;

FIG. 2A illustrates a long short-term memory cell according to an embodiment of the present disclosure;

FIG. 2B illustrates a stacked architecture of one or more hidden layers of an LSTM network according to an embodiment of the present disclosure;

FIG. 2C illustrates a table view of Precision, Recall, and $F_{0.1}$-scores for RNN and LSTM architectures according to an embodiment of the present disclosure;

FIG. 3 is a block diagram of the anomaly detection system of FIG. 1 according to an embodiment of the present disclosure;

FIGS. 4A-4F illustrate graphical representations of sample time-series data sequences received from one or more sensors for detecting one or more events using the anomaly detection system of FIG. 1 according to an embodiment of the present disclosure; and

FIG. 5 is a flow diagram illustrating an anomaly detection method using the anomaly detection system of FIG. 1 according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.

An anomaly detection system and method are provided. The anomaly detection system receives time-series data comprising one or more points related to an anomaly, wherein the time-series data comprises at least one of univariate time-series data and multivariate time-series data; computes an error vector for each of the one or more points in the time-series data to obtain error vectors, wherein each error vector comprises prediction errors; estimates one or more parameters based on the error vectors, wherein the one or more parameters include at least one of mu (μ) and sigma (Σ); and detects the anomaly based on the one or more parameters.

Referring now to the drawings, and more particularly to FIGS. 1 through 5, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments, and these embodiments are described in the context of the following exemplary system and/or method.

FIG. 1 illustrates a network implementation 100 of an anomaly detection system 102 according to an embodiment of the present disclosure. The anomaly detection system 102 is communicatively coupled to a network 106, which in turn is connected to a plurality of user devices 104-1, 104-2, 104-3 . . . 104-N, collectively referred to as the user devices 104 and individually referred to as a user device 104. The user devices 104 may be implemented as any of a variety of conventional computing devices, including, for example, servers, a desktop PC, a notebook or portable computer, a workstation, a mainframe computer, an entertainment device, cellular phones, smart phones, personal digital assistants (PDAs), portable computers, desktop computers, tablet computers, phablets, and an internet appliance.

The anomaly detection system 102 is connected to the user devices 104 over the network 106. The network 106 may be a wireless network, a wired network, or a combination thereof. The network 106 can also be an individual network or a collection of many such individual networks, interconnected with each other and functioning as a single large network, e.g., the Internet or an intranet. The network 106 can be implemented as one of the different types of networks, such as an intranet, a local area network (LAN), a wide area network (WAN), the internet, and such. The network 106 may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols. Further, the network 106 may include network devices, such as network switches, hubs, routers, and HBAs, for providing a communication link between the anomaly detection system 102 and the user devices 104.

In one embodiment, the anomaly detection system 102 may facilitate the detection of an anomaly from time-series data. The anomaly detection system 102 may employ one or more sensors to capture the time-series data. The time-series data comprises at least one of univariate time-series data and multivariate time-series data. Univariate time-series data, as used herein, refers to time-series data comprising one or more points, where each point is a one-dimensional point. Multivariate time-series data, as used herein, refers to time-series data comprising one or more points, where each point is a multidimensional point. The anomaly detection system 102 receives first time-series data comprising a first set of points. Each point in the first set of points in the first time-series data is an m-dimensional vector, where ‘m’ is a natural number. Upon receiving the first time-series data, the anomaly detection system 102 computes an error vector for each of the first set of points in the first time-series data to obtain a first set of error vectors. Each error vector from the first set of error vectors comprises one or more prediction errors. The anomaly detection system 102 estimates one or more parameters based on the first set of error vectors comprising the one or more prediction errors. The anomaly detection system 102 further receives second time-series data comprising a second set of points, computes an error vector for each point from the second set of points, and obtains a second set of error vectors (comprising one or more error vectors). Each error vector from the second set of error vectors comprises one or more prediction errors. The anomaly detection system 102 further applies the one or more parameters on the second set of error vectors. The anomaly detection system 102 then detects an anomaly in the second time-series data when the one or more parameters are applied on the second set of error vectors.
The one or more parameters include, but are not limited to, mu (μ), sigma (Σ), and a threshold (τ). In one example embodiment, the anomaly is detected in the second time-series data by using a prediction model: the anomaly detection system 102 learns the prediction model using one or more stacked long short-term memory (LSTM) neural networks and then computes a prediction error distribution, using which one or more anomalies are detected.

The above methodology is described below by way of an example. Time-series data $X = \{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$ is received by the anomaly detection system 102 from the one or more sensors, which include, but are not limited to, a pressure sensor, speed sensor, gear sensor, temperature sensor, measurement sensor, control sensor, electrocardiogram (ECG) sensor, fuel sensor, actuation sensor, power consumption sensor, etc. Each point $x^{(t)} \in \mathbb{R}^m$ in the time series is an m-dimensional vector $\{x_1^{(t)}, x_2^{(t)}, \dots, x_m^{(t)}\}$ whose elements correspond to the input variables. The anomaly detection system 102 implements a prediction model to learn and predict the next $l$ values for $d$ of the input variables, where $1 \le d \le m$. The normal sequence(s) are divided into four sets: normal train ($s_N$), normal validation-1 ($v_{N1}$), normal validation-2 ($v_{N2}$), and normal test ($t_N$). A normal sequence is a regular time-series pattern that indicates normal behavior of a system. The anomalous sequence(s) are divided into two sets: anomalous validation ($v_A$) and anomalous test ($t_A$). An anomalous sequence is an irregular time-series pattern that indicates unusual behavior of the system.
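A minimal sketch of the partitioning described above, assuming a single long normal recording and a single anomalous recording, each stored as a NumPy array of shape (n, m) and split contiguously in time. The helper names `split_normal` and `split_anomalous` and the split fractions are illustrative assumptions; the disclosure does not specify proportions.

```python
import numpy as np

def split_normal(x, fractions=(0.5, 0.15, 0.15, 0.2)):
    """Split a normal recording x of shape (n, m) into s_N, v_N1, v_N2, t_N.

    Hypothetical helper: chunks are contiguous in time and the fractions
    are illustrative, not values taken from the disclosure.
    """
    n = len(x)
    # Cut points after the first three chunks; the last chunk takes the rest.
    bounds = np.cumsum([int(f * n) for f in fractions[:-1]])
    s_N, v_N1, v_N2, t_N = np.split(x, bounds)
    return s_N, v_N1, v_N2, t_N

def split_anomalous(x, frac_val=0.5):
    """Split an anomalous recording into v_A (validation) and t_A (test)."""
    k = int(frac_val * len(x))
    return x[:k], x[k:]
```

Keeping the chunks contiguous preserves the temporal structure that the prediction model has to learn; shuffling individual points would destroy it.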

Although the present subject matter is explained considering that the anomaly detection system 102 is implemented for detecting the anomaly from the second time-series data (or subsequent time-series data), it may be understood that the anomaly detection system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, a tablet, a mobile phone, and the like. In one embodiment, the anomaly detection system 102 may be implemented in a cloud-based environment.

With reference to FIG. 1, FIG. 2A illustrates a long short-term memory cell of the anomaly detection system 102 according to an embodiment of the present disclosure. Traditional process monitoring techniques use statistical measures such as cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) over a time window to detect changes in an underlying distribution. The length of this time window generally needs to be pre-determined, and the results depend heavily on this parameter. The LSTM neural network overcomes the vanishing gradient problem experienced by recurrent neural networks (RNNs) by employing multiplicative gates that enforce constant error flow through the internal states of special units called ‘memory cells’. The input ($I_G$) 202, forget ($F_G$) 204, and output ($O_G$) 206 gates, shown in FIG. 2A, prevent memory contents from being perturbed by irrelevant inputs and outputs, thereby allowing for long-term memory storage. Because of this ability to learn long-term correlations in a sequence, LSTM networks obviate the need for a pre-specified time window and are capable of accurately modeling complex multivariate sequences.
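The gating described above can be sketched as one forward step of a standard LSTM cell (with a forget gate, without peephole connections) in NumPy. The weight layout, stacked in the order input, forget, output, candidate, and the function name `lstm_cell_step` are assumptions made for illustration, not details taken from the disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (illustrative sketch).

    x: input vector (m,); h_prev, c_prev: previous hidden and cell state (H,).
    W: (4H, m) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the assumed order [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate I_G: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate F_G: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate O_G: how much memory to expose
    g = np.tanh(z[3 * H:])       # candidate memory content
    c = f * c_prev + i * g       # additive cell update
    h = o * np.tanh(c)           # hidden state passed to the next step/layer
    return h, c
```

The additive update `c = f * c_prev + i * g` is what lets the error signal pass through the memory cell largely unattenuated when the forget gate is close to 1, which is the property the paragraph above relies on.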

With reference to FIG. 1 through FIG. 2A, FIG. 2B illustrates a stacked architecture of one or more hidden layers of an LSTM network according to an embodiment of the present disclosure. As shown in FIG. 2B, stacking recurrent hidden layers of sigmoidal (or LSTM) activation units in a network captures the structure of time-series data and allows time series to be processed at different time scales. A notable instance of using hierarchical temporal processing for anomaly detection is the Hierarchical Temporal Memory (HTM) system, which attempts to mimic the hierarchy of cells, regions, and levels in the neocortex. More generally, temporal anomaly detection approaches learn to predict time series and use the prediction errors to detect anomalies.

A predictor is used to model normal behavior, and the prediction errors are subsequently used to identify abnormal behavior. To ensure that the network captures the temporal structure of the sequence, several future time steps of the time-series data are predicted. Thus, each point in the time-series data has multiple corresponding predicted values made at different points in the past, giving rise to multiple error values. The probability distribution of the errors made while predicting on normal data is then used to obtain the probability of normal behavior on the test data. When control variables (such as a vehicle's accelerator or brake) are also present, the LSTM network is made to predict the control variables in addition to the dependent variables. This forces the LSTM network to learn normal usage patterns via the joint distribution of the prediction errors for the control and dependent sensor variables. As a result, the obvious prediction errors made when a control input changes are already captured and do not contribute toward anomaly detection.

With reference to FIG. 1 through FIG. 2B, FIG. 2C illustrates a table view of Precision, Recall, and $F_{0.1}$-scores for RNN and LSTM architectures according to an embodiment of the present disclosure. A prediction model based on a stacked LSTM network is implemented. In the example embodiments described herein, a (30-20) configuration of hidden units is considered, i.e., 30 and 20 units in the first and second hidden layers of the LSTM network, respectively. The input layer has one unit for each of the $m$ dimensions, and the output layer has $d \times l$ units, one for each of the $l$ future predictions for each of the $d$ dimensions. The LSTM units in a hidden layer are connected through recurrent connections. The LSTM layers are stacked such that each unit in a lower LSTM hidden layer is connected to each unit in the LSTM hidden layer above it through feedforward connections, as depicted in FIG. 2B. The anomaly detection system 102 learns the prediction model using the sequence(s) in $s_N$. The set $v_{N1}$ is used for early stopping while learning the network weights.
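A forward-pass sketch of the (30-20) stacked architecture, assuming untrained random weights and a plain linear output layer with $d \times l$ units. Training by backpropagation through time and early stopping on $v_{N1}$ are omitted. The function names, the weight layout, and the output ordering (output unit `i * l + (j - 1)` holds the j-step-ahead prediction of the i-th selected dimension, 0-based i) are hypothetical choices for this illustration.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(rng, n_in, n_hidden, scale=0.1):
    """Random weights for one LSTM layer (gate order: input, forget, output, candidate)."""
    return {"W": rng.normal(0.0, scale, (4 * n_hidden, n_in)),
            "U": rng.normal(0.0, scale, (4 * n_hidden, n_hidden)),
            "b": np.zeros(4 * n_hidden)}

def stacked_lstm_predict(x_seq, layers, V, b_out, d, l):
    """Run a stacked LSTM over x_seq (T, m); return predictions of shape (T, d, l)."""
    T = x_seq.shape[0]
    # Each layer keeps its own (hidden, cell) state across time (recurrent connections).
    states = [(np.zeros(p["U"].shape[1]), np.zeros(p["U"].shape[1])) for p in layers]
    out = np.empty((T, d, l))
    for t in range(T):
        inp = x_seq[t]  # one input unit per each of the m dimensions
        for k, p in enumerate(layers):
            h, c = states[k]
            H = h.shape[0]
            z = p["W"] @ inp + p["U"] @ h + p["b"]
            i, f, o = _sigmoid(z[:H]), _sigmoid(z[H:2 * H]), _sigmoid(z[2 * H:3 * H])
            c = f * c + i * np.tanh(z[3 * H:])
            h = o * np.tanh(c)
            states[k] = (h, c)
            inp = h  # feedforward connection to the layer above
        # Linear output layer with d*l units, reshaped to (d, l).
        out[t] = (V @ inp + b_out).reshape(d, l)
    return out
```

For example, with $m = 3$, $d = 2$, $l = 5$, the layers `[init_layer(rng, 3, 30), init_layer(rng, 30, 20)]` and an output matrix `V` of shape (10, 20) reproduce the (30-20) layout and yield a (T, 2, 5) array of predictions.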

With a prediction length of $l$, each of the selected $d$ dimensions of $x^{(t)} \in X$ for $l < t \le n - l$ is predicted $l$ times. An error vector $e^{(t)}$ is computed for point $x^{(t)}$ as $e^{(t)} = [e_{11}^{(t)}, \dots, e_{1l}^{(t)}, \dots, e_{d1}^{(t)}, \dots, e_{dl}^{(t)}]$, where $e_{ij}^{(t)}$ is the difference between $x_i^{(t)}$ and its value as predicted at time $t - j$. The prediction model trained on $s_N$ is used to compute the error vectors for each point in the validation and test sequences. The error vectors are modeled to fit a multivariate Gaussian distribution $\mathcal{N} = \mathcal{N}(\mu, \Sigma)$. The likelihood $p^{(t)}$ of observing an error vector $e^{(t)}$ is given by the value of $\mathcal{N}$ at $e^{(t)}$. In one example embodiment, the error vectors for the points from $v_{N1}$ are used to estimate the parameters $\mu$ (a mean vector) and $\Sigma$ (a covariance matrix) using maximum likelihood estimation.
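The error-vector construction and the maximum likelihood fit can be sketched in NumPy as follows. The sketch assumes the prediction model's outputs are stored in a hypothetical array `preds` of shape (n, d, l), where `preds[s, i, j-1]` is the value of the i-th selected dimension predicted at time s for time s + j. Indices are 0-based, and an error vector is produced for every t that has all l earlier predictions available; the disclosure states the range as $l < t \le n - l$ in its own indexing. Log-likelihoods are used instead of raw densities to avoid numerical underflow, so comparing $\log p^{(t)}$ with $\log \tau$ replaces comparing $p^{(t)}$ with $\tau$.

```python
import numpy as np

def error_vectors(x, preds, dims, l):
    """Build e^(t) = [e_11, ..., e_1l, ..., e_d1, ..., e_dl] for each usable t.

    x: (n, m) observations; dims: indices of the d predicted input variables;
    preds: (n, d, l) with preds[s, i, j-1] = prediction made at time s for time s+j.
    Returns (ts, E), where E[k] is the error vector of point x[ts[k]].
    """
    n, d = x.shape[0], len(dims)
    ts = np.arange(l, n)  # points that have all l earlier predictions
    E = np.empty((len(ts), d * l))
    for k, t in enumerate(ts):
        for i, dim in enumerate(dims):
            for j in range(1, l + 1):
                # e_ij: actual value minus its value as predicted at time t - j
                E[k, i * l + (j - 1)] = x[t, dim] - preds[t - j, i, j - 1]
    return ts, E

def fit_gaussian(E):
    """Maximum likelihood estimates of mu (mean vector) and Sigma (covariance)."""
    mu = E.mean(axis=0)
    diff = E - mu
    sigma = diff.T @ diff / E.shape[0]  # MLE divides by N, not N - 1
    return mu, sigma

def log_likelihood(E, mu, sigma):
    """Row-wise log N(e; mu, Sigma) for error vectors stacked in E."""
    k = mu.shape[0]
    diff = E - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)
```

In this sketch, $\mu$ and $\Sigma$ would be fitted on the error vectors of the $v_{N1}$ points and then reused unchanged for the validation and test sequences; `np.exp(log_likelihood(...))` recovers $p^{(t)}$ when the values are not too small.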

An observation $x^{(t)}$ is classified as an ‘anomalous detection’ if $p^{(t)} < \tau$; otherwise, the observation is classified as a ‘normal behavior detection’. The sets $v_{N2}$ and $v_A$ are used to learn $\tau$ by maximizing the $F_\beta$-score, where anomalous points belong to the positive class and normal points belong to the negative class.
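Threshold selection can be sketched as a search over candidate values of $\tau$ drawn from the likelihoods of the $v_{N2}$ and $v_A$ points, keeping the candidate with the highest $F_\beta$-score, where $F_\beta = (1+\beta^2)\,PR / (\beta^2 P + R)$ for precision $P$ and recall $R$. Using the observed likelihood values as the candidate set, and the function names, are assumptions of this sketch. The same code works on log-likelihoods, since the logarithm preserves the ordering.

```python
import numpy as np

def f_beta(y_true, y_pred, beta):
    """F-beta score with anomalous points as the positive class (boolean arrays)."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def learn_threshold(p_normal, p_anomalous, beta=0.1):
    """Pick tau maximizing F_beta on v_N2 (normal) and v_A (anomalous) likelihoods."""
    scores = np.concatenate([p_normal, p_anomalous])
    labels = np.concatenate([np.zeros(len(p_normal), dtype=bool),
                             np.ones(len(p_anomalous), dtype=bool)])
    best_tau, best_f = None, -1.0
    for tau in np.unique(scores):
        f = f_beta(labels, scores < tau, beta)  # p < tau means "anomalous"
        if f > best_f:
            best_tau, best_f = tau, f
    return best_tau, best_f

def classify(p, tau):
    """Label each likelihood value as anomalous (p < tau) or normal."""
    return np.where(p < tau, "anomalous", "normal")
```

With $\beta = 0.1$, a candidate that raises one false alarm is penalized far more than one that misses a few anomalous points, which is why the search tends to settle on a conservative $\tau$.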

With reference to FIGS. 1 and 2C, FIG. 3 illustrates a block diagram of the anomaly detection system 102 of FIG. 1 according to an embodiment of the present disclosure. In one embodiment, the system 102 may include at least one hardware processor 302, an input/output (I/O) interface 304, and a memory 306. The at least one processor 302 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Further, the at least one processor 302 may comprise a multi-core architecture. Among other capabilities, the at least one processor 302 is configured to fetch and execute computer-readable instructions or modules stored in the memory 306.

The I/O interface 304 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 304 may allow the system 102 to interact with a user directly or through the user devices 104. Further, the I/O interface 304 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 304 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 304 may include one or more ports for connecting a number of devices to one another or to another server.

The memory 306 may include any computer-readable medium or computer program product known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, compact disks (CDs), digital versatile discs or digital video discs (DVDs), and magnetic tapes. The memory 306 may include the one or more modules 308 as described.

The modules include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the above-described embodiments and methodology may be implemented and executed by using the modules 308 (or 308A-N). For example, the anomaly detection system 102 comprises an error vector computation module that computes an error vector for each of the one or more points (i) in the first time-series data to obtain the first set of error vectors and (ii) in the second time-series data to obtain the second set of error vectors. Each error vector from the first set and the second set of error vectors comprises one or more prediction errors. The anomaly detection system 102 further comprises an estimation module that estimates one or more parameters (μ, a mean vector; Σ, a covariance matrix; and the threshold τ) based on the first set of error vectors. The one or more parameters are then applied on the second set of error vectors, based on which an anomaly is detected in the second time-series data. The one or more parameters are applied on the second set of error vectors to obtain one or more likelihood values. The one or more likelihood values are then compared with the threshold (τ). When at least one of the one or more likelihood values is less than the threshold (τ), the anomaly is detected. In one example embodiment, when l = 1 and d = 1, each error vector has a single element, so mu (μ) and sigma (Σ) are scalars. The anomaly detection system 102 may further comprise an anomaly detection module that detects an anomaly by using the one or more parameters. The anomaly detection system 102 may further comprise a prediction module that executes a prediction model to learn and predict the next $l$ values for $d$ of the input variables, where $1 \le d \le m$, as described above.
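The flow through the modules above (error vector computation, parameter estimation, and anomaly detection) can be sketched end to end. To keep the example self-contained, a naive persistence forecaster, which predicts every future value to equal the current one, stands in for the LSTM prediction module. This substitution, the small ridge term added to Σ for numerical stability, and all function names are assumptions, not part of the disclosure. The threshold is supplied in log space here rather than learned.

```python
import numpy as np

def persistence_predictions(x, dims, l):
    # Stand-in for the LSTM prediction module (assumption): every future value
    # is predicted to equal the current one. preds[s, i, j-1] forecasts x[s+j, dims[i]].
    return np.repeat(x[:, dims][:, :, None], l, axis=2)

def compute_error_vectors(x, dims, l):
    """Error vector computation: e_ij^(t) = x_i^(t) - prediction made at t - j."""
    preds = persistence_predictions(x, dims, l)
    rows = [[x[t, dim] - preds[t - j, i, j - 1]
             for i, dim in enumerate(dims) for j in range(1, l + 1)]
            for t in range(l, len(x))]
    return np.asarray(rows)

def estimate_parameters(E1, ridge=1e-6):
    """Estimation: MLE mean and covariance (ridge term is an added assumption)."""
    mu = E1.mean(axis=0)
    diff = E1 - mu
    sigma = diff.T @ diff / len(E1) + ridge * np.eye(E1.shape[1])
    return mu, sigma

def log_likelihoods(E, mu, sigma):
    """Apply the parameters: multivariate Gaussian log-density of each error vector."""
    diff = E - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

def detect(x1, x2, dims, l, log_tau):
    """Fit on the first series, score the second, return flagged time indices in x2."""
    mu, sigma = estimate_parameters(compute_error_vectors(x1, dims, l))
    logp = log_likelihoods(compute_error_vectors(x2, dims, l), mu, sigma)
    return np.flatnonzero(logp < log_tau) + l  # row k corresponds to time k + l
```

For example, with a noisy sine wave as the first time-series data and a copy containing a single +5 spike at index 300 as the second, `detect(x1, x2, dims=[0], l=2, log_tau=-20.0)` flags index 300; it may also flag indices 301 and 302, whose error vectors include the spiked value.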

The anomaly detection system 102 further comprises a modeling module that models the one or more error vectors to obtain a multivariate Gaussian distribution. The anomaly detection system 102 also comprises a classification module that classifies an event as either an anomaly detection or a normal behavior detection based on a comparison of the likelihood value(s) with the threshold (τ) stored in the memory 306. In one embodiment, the error vector computation module, the estimation module, the anomaly detection module, the modeling module, the classification module, and the prediction module are implemented as a logically self-contained part of a software program that, when executed, performs the method described herein.

In another embodiment, the error vector computation module, the estimation module, the anomaly detection module, the modeling module, the classification module, and the prediction module are implemented as a self-contained hardware component. In yet another embodiment, the above modules may be implemented as a self-contained hardware component, with a logically self-contained part of a software program embedded into each hardware component.

The system 102 may further comprise other modules that may include programs or coded instructions to supplement applications and functions of the system 102. The memory 306 stores data, amongst other things, and serves as a repository for storing data processed, received, and generated by one or more of the modules. The data may also include a system database and other data 322. The other data may include data generated as a result of the execution of one or more of the other modules.

In one implementation, the one or more sensors may first be positioned across an environment for detecting events related to the environment. For example, the sensors may be positioned across a space shuttle for capturing time-series data related to an event, based on which deviations from normal behavior are detected. The time-series data may comprise timestamp information and/or data points.

The time-series data captured by the one or more sensors may be of variable length, such that the time duration of the time-series data may vary. Thus, the time-series data captured by multiple sensors may be used for determining the event related to the vehicle. The event may be determined accurately because time-series data from multiple sensors may be used.

With reference to FIGS. 1 through 3, FIGS. 4A-4F illustrate graphical representations of sample time-series data sequences received from one or more sensors for detecting one or more events using the anomaly detection system 102 of FIG. 1 according to an embodiment of the present disclosure. Sample sequences for the four datasets are shown in FIGS. 4A-4F, with y-axes labeled ‘Sensor’ in FIGS. 4A-4E and ‘Demand’ in FIG. 4F. Normal behavior parts are indicated as 402, and anomalous parts are indicated as 404. The corresponding likelihood values from the error distribution are shown in log scale with y-axes labeled ‘p’, along with dashed lines showing the threshold (τ), indicated as 406, and activation sequences of the hidden layers, indicated as 408.

Referring to FIG. 4A, FIG. 4A illustrates the qtdb/sel102 ECG dataset, containing a single short-term anomaly corresponding to a pre-ventricular contraction. Since the ECG dataset has only one anomaly, a threshold and the corresponding $F_{0.1}$-score cannot be calculated for this dataset; instead, the anomaly detection system 102 learns the prediction model using a normal ECG subsequence and computes the likelihood of the error vectors for the remaining sequence from the time series.

FIG. 4B illustrates a space shuttle Marotta valve time-series dataset. This dataset has both short time-period patterns and long time-period patterns that last hundreds of time steps. There are three anomalous regions in the dataset, marked as 404 (a1, a2, and a3) in FIG. 4B. Region a3 is a more easily discernible anomaly, whereas regions a1 and a2 correspond to more subtle anomalies that are not easily discernible at this resolution.

FIGS. 4C-4E illustrate a multi-sensor engine dataset, showing the original subsequences for the two dimensions being predicted (labeled ‘Sensor’ and ‘Control’) and the likelihood values for two architectures. FIG. 4C shows sample normal behavior of a locomotive engine, such as a motor, whereas FIGS. 4D and 4E show two different instances of faulty engines. Plots with the same $S_i$ ($i = 1, 2, 3$) share the same y-axis scale. This dataset has readings from 12 different sensors. One of the sensors is the ‘control’ sensor of the engine, which measures control variables, and the remaining sensors measure dependent variables such as temperature and torque. The anomaly detection system 102 is first trained using normal sequences $s_N$ (to learn the prediction model) corresponding to one or more independent faults, and the $F_\beta$-score is measured on a distinct set of the one or more independent faults. In other words, the system 102 is first trained using normal sequences $s_N$ to learn the prediction model, and the threshold is then computed by maximizing the $F_\beta$-score using the normal $v_{N2}$ and faulty (anomalous) $v_A$ sequences. The ‘control’ sensor is chosen together with one of the dependent variables as the dimensions to be predicted.

FIG. 4F illustrates a power demand dataset. The normal behavior corresponds to weeks where the power consumption has five peaks corresponding to the five weekdays and two troughs corresponding to the weekend. This dataset has a very long-term pattern spanning hundreds of time steps. Additionally, the data is noisy because the peaks do not occur at exactly the same time of the day. Panels (f.1) and (f.2) show activation sequences for selected LSTM hidden units for the lower (LSTM-L1) and higher (LSTM-L2) hidden layers, respectively.

The key observations from the above experimental results are as follows:

-   (i) In FIGS. 4A and 4E, the likelihood values p^((t)) are significantly lower in the anomalous regions than in the normal regions for all datasets. Further, the p^((t)) values do not remain low throughout the anomalous regions. A value of β ≪ 1 (β = 0.1) is deliberately used so as to give higher importance to precision than to recall. It is to be noted that although all points in an anomalous subsequence are labelled ‘anomalous’, in practice there may be a plurality of points of ‘normal’ behavior even amongst these points. It therefore suffices that a significant percentage of the points in an ‘anomalous’ subsequence are predicted as anomalous. The values of τ obtained (represented by dashed lines in the p^((t)) plots in FIGS. 4A and 4F) suggest that the F_(β)-Score (as described in FIG. 2) is a suitable metric for the datasets considered.
-   (ii) The positive likelihood ratio (the ratio of the true positive rate to the false positive rate) has been found to be high (more than 34.0) for all the datasets. A high positive likelihood ratio suggests that the probability of reporting an anomaly in an anomalous region is much higher than the probability of reporting an anomaly in a normal region.
-   (iii) The activations of selected hidden units, four each from layers LSTM-L1 (lower hidden layer with 30 units) and LSTM-L2 (higher hidden layer with 20 units), for the power demand dataset are shown in FIG. 4 (f.1) and (f.2). Subsequences marked w₁ and w₂ in the last activation sequence shown in FIG. 4 (f.2) indicate that this hidden unit's activation is high during weekdays and low during weekends. These are instances of high-level features being learned by the higher hidden layer; these features appear to operate at a weekly time-scale.
-   (iv) LSTM-AD refers to anomaly detection using a long short-term memory neural network, and RNN-AD refers to anomaly detection using a recurrent neural network with sigmoidal units in the hidden layers. As shown in FIG. 2C, for the ‘ECG’ and ‘engine’ datasets, which do not have any long-term temporal dependence, LSTM-AD and RNN-AD perform equally well. On the other hand, for the ‘space shuttle’ and ‘power demand’ datasets, which have long-term temporal dependencies along with short-term dependencies, LSTM-AD shows a significant improvement of 18% and 30%, respectively, over RNN-AD in terms of F_(0.1)-Score.
-   (v) The fraction of anomalous points detected for periods prior to faults in the ‘engine’ dataset is higher than that during normal operation. This suggests that the embodiments and/or the methodology described herein may also be used for early fault prediction.
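Observations (i) and (ii) rely on two point-wise metrics: the F_(β)-Score with β = 0.1 and the positive likelihood ratio (true positive rate divided by false positive rate). The following minimal Python/NumPy sketch computes both from predicted and true labels. The function name `detection_metrics` and the example labels are hypothetical and are not results from the disclosure.

```python
import numpy as np

def detection_metrics(predicted, labels, beta=0.1):
    """Hypothetical helper: precision, recall, F_beta and positive likelihood ratio."""
    predicted = np.asarray(predicted, dtype=bool)
    labels = np.asarray(labels, dtype=bool)
    tp = int(np.sum(predicted & labels))
    fp = int(np.sum(predicted & ~labels))
    fn = int(np.sum(~predicted & labels))
    tn = int(np.sum(~predicted & ~labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0      # false positive rate
    denom = beta**2 * precision + recall
    f_beta = (1 + beta**2) * precision * recall / denom if denom else 0.0
    plr = recall / fpr if fpr else float("inf")   # TPR / FPR
    return {"precision": precision, "recall": recall, "f_beta": f_beta, "plr": plr}

labels = np.array([0] * 90 + [1] * 10, dtype=bool)  # one 10-point anomalous subsequence
predicted = np.zeros(100, dtype=bool)
predicted[0] = True        # one false alarm in the normal region
predicted[90:94] = True    # only 4 of the 10 anomalous points are flagged
metrics = detection_metrics(predicted, labels)
```

In this example, precision is 0.8 and recall is only 0.4. With β = 0.1, the score is about 0.79, close to the precision. With β = 1, the score would fall to about 0.53. This illustrates why flagging only part of an ‘anomalous’ subsequence is not heavily penalized under F_(0.1). The positive likelihood ratio here is 36.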

With reference to FIGS. 1 through 4, FIG. 5 is a flow diagram illustrating an anomaly detection method using the anomaly detection system 102 of FIG. 1 according to an embodiment of the present disclosure. In step 502, a first time-series data comprising a first set of points is received. Each point in the first set of points is an m-dimensional vector, where ‘m’ is a natural number ranging from 1 to n. In step 504, an error vector is computed for each of the first set of points in the first time-series data to obtain a first set of error vectors. Each of the first set of error vectors comprises one or more prediction errors. In step 506, one or more parameters are estimated based on the first set of error vectors to obtain a set of estimated parameters. The set of estimated parameters comprises at least one of mu (μ), sigma (Σ), and the threshold (τ). In step 508, a second time-series data comprising a second set of points is received. Each point in the second set of points is also an m-dimensional vector. In step 510, an error vector is computed for each of the second set of points in the second time-series data to obtain a second set of error vectors. Each error vector in the second set of error vectors comprises one or more prediction errors. In step 512, the set of estimated parameters is applied on the second set of error vectors of the second time-series data. In step 514, an anomaly is detected in the second time-series data when the set of estimated parameters is applied on the second set of error vectors. More specifically, applying the set of estimated parameters on the second set of error vectors yields one or more likelihood values, and the anomaly is detected when at least one of the one or more likelihood values is less than the threshold. In one example embodiment, any combination of parameters from the set of estimated parameters may be applied (or used) on the second set of error vectors of the second time-series data.
At least one of the first set of error vectors is modelled to obtain a multivariate Gaussian distribution. The anomaly is detected based on a prediction model that uses a long short-term memory (LSTM) neural network. The first time-series data and the second time-series data each comprise at least one of a univariate time-series data and a multivariate time-series data.
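Steps 504 and 506 can be sketched in Python/NumPy under stated assumptions. The document only says that each error vector comprises one or more prediction errors. This sketch assumes that the prediction model forecasts the next l points at every time step, so that each point x(t) is predicted l times. Its error vector then stacks those l signed errors for all m dimensions. The parameters μ and Σ are taken as the maximum-likelihood mean and covariance of the error vectors. The helper names, the (n, l, m) prediction layout, and the naive stand-in forecaster are assumptions, not the disclosed LSTM model.

```python
import numpy as np

def error_vectors(series, predictions, l):
    """Hypothetical helper: one error vector per point x(t), for t >= l.

    Assumption: predictions[t] is an (l, m) array holding the forecasts of
    x(t+1), ..., x(t+l) made at time t. The forecast of x(t) issued at time
    t - j is therefore predictions[t - j][j - 1], for j = 1..l.
    """
    series = np.asarray(series, dtype=float)
    n, m = series.shape
    errors = []
    for t in range(l, n):
        e = [series[t] - predictions[t - j][j - 1] for j in range(1, l + 1)]
        errors.append(np.concatenate(e))  # length l * m
    return np.array(errors)               # shape (n - l, l * m)

def fit_gaussian(errors):
    """Maximum-likelihood estimates of mu and Sigma for the error vectors."""
    mu = errors.mean(axis=0)
    sigma = np.cov(errors, rowvar=False, bias=True)  # bias=True divides by N (MLE)
    return mu, np.atleast_2d(sigma)

# Usage with a naive "repeat the last value" forecaster standing in for the LSTM.
series = np.arange(10, dtype=float).reshape(-1, 1)      # n = 10, m = 1
l = 2
predictions = np.repeat(series[:, None, :], l, axis=1)  # shape (10, l, m)
errors = error_vectors(series, predictions, l)
mu, sigma = fit_gaussian(errors)
```

Because the toy series is a straight ramp, every error vector equals [1, 2], so Σ is the zero matrix in this example. On real normal training data, the errors vary and Σ is generally invertible, which is required for the likelihood computation.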

During a learning phase, the LSTM-based prediction model, the Gaussian parameters mu (μ) and sigma (Σ), and the threshold (τ) are learnt. All of these are then used for anomaly detection, i.e., to classify the points in a new (or subsequent) time-series as normal or anomalous. The prediction model is used to obtain error vectors, which are then used to obtain the likelihood values using the mu and sigma learnt during the learning phase. If the likelihood is lower than the τ learnt during the learning phase, the point is classified as anomalous; otherwise, it is classified as normal. In other words, the anomaly detection system 102 uses the same set of estimated parameters, mu (μ), sigma (Σ), and the threshold (τ), on a third received time-series data and a fourth received time-series data. For example, when the third time-series data comprising a third set of points is received by the anomaly detection system 102, the anomaly detection system 102 computes an error vector for each point in the third set of points to obtain a third set of error vectors. The set of estimated parameters is applied (or used) on the third set of error vectors to obtain a set of likelihood values (also referred to as a third set of likelihood values) specific to the third time-series data. When at least one of the third set of likelihood values is less than the threshold (τ), one or more anomalies are detected in the third time-series data. Likewise, when the fourth time-series data comprising a fourth set of points is received by the anomaly detection system 102, the anomaly detection system 102 computes an error vector for each point in the fourth set of points to obtain a fourth set of error vectors.
The set of estimated parameters is applied (or used) on the fourth set of error vectors to obtain a set of likelihood values (also referred to as a fourth set of likelihood values) specific to the fourth time-series data. When at least one of the fourth set of likelihood values is less than the threshold (τ), one or more anomalies are detected in the fourth time-series data, and so on. Like the first time-series data and the second time-series data, the third time-series data and the fourth time-series data each comprise at least one of the univariate time-series data and the multivariate time-series data.
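The reuse of μ, Σ, and τ on later time-series can be illustrated with a minimal Python/NumPy sketch. Here p(t) is taken to be the multivariate Gaussian density of the error vector e(t), and points with p(t) < τ are flagged. The class name `GaussianErrorDetector` is hypothetical. The parameter values and error vectors below are made up for illustration. In practice, the error vectors would come from the trained prediction model.

```python
import numpy as np

def gaussian_likelihood(errors, mu, sigma):
    """p(t) = N(e(t); mu, Sigma) for each row e(t) of errors."""
    errors = np.atleast_2d(np.asarray(errors, dtype=float))
    d = mu.shape[0]
    diff = errors - mu
    inv = np.linalg.inv(sigma)
    _, logdet = np.linalg.slogdet(sigma)
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distances
    log_p = -0.5 * (d * np.log(2 * np.pi) + logdet + mahal)
    return np.exp(log_p)

class GaussianErrorDetector:
    """Hypothetical holder for mu, Sigma and tau learnt once, reused on later series."""

    def __init__(self, mu, sigma, tau):
        self.mu = np.asarray(mu, dtype=float)
        self.sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
        self.tau = float(tau)

    def detect(self, errors):
        p = gaussian_likelihood(errors, self.mu, self.sigma)
        return p, p < self.tau  # anomalous where the likelihood is below tau

detector = GaussianErrorDetector(mu=[0.0, 0.0], sigma=np.eye(2), tau=0.01)
p3, flags3 = detector.detect([[0.0, 0.0], [0.5, -0.5]])  # a "third" series
p4, flags4 = detector.detect([[0.0, 0.0], [3.0, 3.0]])   # a "fourth" series
```

The same detector object, and therefore the same μ, Σ, and τ, is applied to every new series without re-estimation. Only the error vector at [3, 3], with a likelihood of about 2e-5, falls below τ.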

The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments and/or the methodology described herein. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.

It is, however, to be understood that the scope of protection extends to such a program and, in addition, to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementing one or more steps of the method when the program runs on a server, a mobile device, or any suitable programmable device. The hardware device can be any kind of device that can be programmed, including, e.g., any kind of computer such as a server or a personal computer, or the like, or any combination thereof. The device may also include means that could be, e.g., hardware means such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g., an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the invention may be implemented on different hardware devices, e.g., using a plurality of CPUs.

The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

A representative hardware environment for practicing the embodiments may include a hardware configuration of an information handling/computer system in accordance with the embodiments herein. The system herein comprises at least one processor or central processing unit (CPU). The CPUs are interconnected via a system bus to various devices such as a random access memory (RAM), read-only memory (ROM), and an input/output (I/O) adapter. The I/O adapter can connect to peripheral devices, such as disk units and tape drives, or other program storage devices that are readable by the system. The system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of the embodiments herein.

The system further includes a user interface adapter that connects a keyboard, mouse, speaker, microphone, and/or other user interface devices such as a touch screen device (not shown) to the bus to gather user input. Additionally, a communication adapter connects the bus to a data processing network, and a display adapter connects the bus to a display device which may be embodied as an output device such as a monitor, printer, or transmitter, for example.

The anomaly detection system 102 implements stacked LSTM networks that are able to learn higher-level temporal patterns without prior knowledge of the pattern duration. Stacked LSTM networks may therefore be a viable technique to model normal time-series behavior, which can then be used to detect anomalies. The anomaly detection system 102 implements an LSTM-AD technique suited to datasets that involve modelling short-term as well as long-term temporal dependencies. In addition, dependencies among different dimensions of a multivariate time-series data can be learnt by the LSTM network. This enables normal behavior to be learnt efficiently, and hence anomalous behavior to be detected more accurately. As depicted in FIG. 2, the table illustrates a comparison of experimental results of the LSTM-AD technique and RNN-AD. The comparison suggests that LSTM-based prediction models may be more robust than RNN-based models, especially when there is no a priori information on whether the normal behavior involves long-term dependencies. Unlike conventional anomaly detection systems and methods, the anomaly detection system 102 does not require any feature engineering or pre-processing. The proposed embodiments enable the anomaly detection system 102 to easily capture long-term correlations in time-series using LSTMs, which improves anomaly detection. The anomaly detection system 102 uses ‘normal data’ to learn the prediction model without any need for anomalous data. The proposed embodiments described and implemented by the anomaly detection system 102 can be leveraged in, but are not limited to, (i) an Internet of Things (IoT) setting to detect anomalous behavior, (ii) fault detection in the manufacturing domain, (iii) health monitoring, etc.
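To make the stacked-LSTM architecture concrete, here is a forward-pass-only sketch in Python/NumPy. It stacks two LSTM layers, with the lower layer's hidden states feeding the upper layer, and adds a linear read-out that predicts the next l points of an m-dimensional series. The default sizes of 30 and 20 units follow the power demand example in observation (iii). Everything else is an assumption: the class names, the standard input/forget/output-gate LSTM cell, the random initialization, and the l-step read-out. Training (e.g., backpropagation through time on normal sequences only) is omitted, so the predictions below are untrained and only show shapes and data flow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMLayer:
    """Hypothetical single LSTM layer (forward pass only)."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # Weights for the four gates stacked as [input, forget, candidate, output].
        self.W = rng.uniform(-scale, scale, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def forward(self, xs):
        """xs: (T, n_in) -> hidden states (T, n_hidden)."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)  # cell state carries long-term memory
            h = o * np.tanh(c)
            outputs.append(h)
        return np.array(outputs)

class StackedLSTMPredictor:
    """Hypothetical stacked LSTM with a linear read-out of the next l points."""

    def __init__(self, m, l, hidden=(30, 20), seed=0):
        rng = np.random.default_rng(seed)
        sizes = (m,) + tuple(hidden)
        self.layers = [LSTMLayer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.W_out = rng.uniform(-0.1, 0.1, (l * m, hidden[-1]))
        self.m, self.l = m, l

    def predict(self, series):
        """series: (T, m) -> predictions (T, l, m) of x(t+1), ..., x(t+l) at each t."""
        h = np.asarray(series, dtype=float)
        for layer in self.layers:
            h = layer.forward(h)  # lower layer's hidden states feed the next layer
        return (h @ self.W_out.T).reshape(len(series), self.l, self.m)

model = StackedLSTMPredictor(m=2, l=3)
t = np.linspace(0, 6, 100)
preds = model.predict(np.stack([np.sin(t), np.cos(t)], axis=1))
```

Once trained, such a model's predictions could feed the error-vector and Gaussian-likelihood steps described above.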

The preceding description has been presented with reference to various embodiments. Persons having ordinary skill in the art and technology to which this application pertains will appreciate that alterations and changes in the described structures and methods of operation can be practiced without meaningfully departing from the principle, spirit and scope.

What is claimed is:
 1. A processor implemented anomaly detection method comprising one or more hardware processors coupled with a memory for: receiving a first time-series data comprising a first set of points, wherein said first set of points in said first time-series data is an m-dimensional vector; computing an error vector for each point from said first set of points in said first time-series data to obtain a first set of error vectors, wherein each error vector from said first set of error vectors comprises one or more prediction errors; estimating one or more parameters based on said first set of error vectors to obtain a set of estimated parameters; receiving a second time-series data comprising a second set of points; computing an error vector for each point from said second set of points in said second time-series data to obtain a second set of error vectors; applying said set of estimated parameters on said second set of error vectors; and detecting an anomaly in said second time-series data when said set of estimated parameters are applied on said second set of error vectors, wherein said anomaly is detected based on a prediction model by using a long short term memory (LSTM) neural network.
 2. The method of claim 1, further comprising modeling at least one of said first set of error vectors to obtain a multivariate Gaussian distribution.
 3. The method of claim 1, further comprising obtaining one or more likelihood values when said set of estimated parameters are applied on said second set of error vectors.
 4. The method of claim 3, wherein said set of estimated parameters comprises at least one of mu (μ), sigma (Σ), and a threshold.
 5. The method of claim 4, wherein said anomaly is detected in said second time-series data when at least one of said one or more likelihood values is less than said threshold.
 6. The method of claim 1, further comprising detecting an anomaly in a third time-series data and a fourth time-series data by applying the set of estimated parameters on (i) a third set of error vectors corresponding to the third time-series data and (ii) a fourth set of error vectors corresponding to the fourth time-series data, wherein said first time-series data, said second time-series data, said third time-series data, and said fourth time-series data comprise at least one of a univariate time-series data and a multivariate time-series data.
 7. An anomaly detection system comprising: one or more hardware processors; and a memory storing instructions to configure the one or more hardware processors, wherein the one or more hardware processors are configured by the instructions to: receive a first time-series data comprising a first set of points, wherein said first set of points in said first time-series data is an m-dimensional vector; compute an error vector for each point from said first set of points in said first time-series data to obtain a first set of error vectors, wherein each error vector from said first set of error vectors comprises one or more prediction errors; estimate one or more parameters based on said first set of error vectors to obtain a set of estimated parameters; receive a second time-series data comprising a second set of points; compute an error vector for each point from said second set of points in said second time-series data to obtain a second set of error vectors; apply said set of estimated parameters on said second set of error vectors; and detect an anomaly in said second time-series data when said set of estimated parameters are applied on said second set of error vectors, wherein said anomaly is detected based on a prediction model by using a long short term memory (LSTM) neural network.
 8. The system of claim 7, wherein said one or more hardware processors are further configured by the instructions to model at least one of said first set of error vectors to obtain a multivariate Gaussian distribution.
 9. The system of claim 7, wherein said one or more hardware processors are further configured by the instructions to obtain one or more likelihood values when said set of estimated parameters are applied on said second set of error vectors.
 10. The system of claim 7, wherein said set of estimated parameters comprises at least one of mu (μ), sigma (Σ), and a threshold.
 11. The system of claim 10, wherein when at least one of said one or more likelihood values is less than said threshold, said anomaly is detected in said second time-series data, and wherein said first time-series data and said second time-series data comprise at least one of a univariate time-series data and a multivariate time-series data.
 12. One or more non-transitory machine readable information storage mediums comprising one or more instructions, which when executed by one or more hardware processors cause an anomaly detection by performing the steps of: receiving a first time-series data comprising a first set of points, wherein said first set of points in said first time-series data is an m-dimensional vector; computing an error vector for each point from said first set of points in said first time-series data to obtain a first set of error vectors, wherein each error vector from said first set of error vectors comprises one or more prediction errors; estimating one or more parameters based on said first set of error vectors to obtain a set of estimated parameters; receiving a second time-series data comprising a second set of points; computing an error vector for each point from said second set of points in said second time-series data to obtain a second set of error vectors; applying said set of estimated parameters on said second set of error vectors; and detecting an anomaly in said second time-series data when said set of estimated parameters are applied on said second set of error vectors, wherein said anomaly is detected based on a prediction model by using a long short term memory (LSTM) neural network.
 13. The one or more non-transitory machine readable information storage mediums of claim 12, wherein said set of estimated parameters comprises at least one of a mu (μ), sigma (Σ), and a threshold, wherein one or more likelihood values are obtained when said set of estimated parameters are applied on said second set of error vectors, and wherein when at least one of said one or more likelihood values is less than said threshold, said anomaly is detected.